Corpus: bos_wikipedia_2021_300K


4.4.1.5 Number of Word-N-grams at Sentence Endings

Number of distinct word N-grams (N = 1...5) at sentence endings among the first K sentences

      K  # of words  # of bigrams  # of trigrams  # of 4-grams  # of 5-grams
    100          93            98             99            99            99
   1000         867           982            995           996           996
  10000        6962          9526           9869          9934          9955
 100000       42477         85472          96215         98709         99311
1000000       92574        232574         280310        293476        296866
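
The counts above could be reproduced along the following lines. This is only a minimal sketch, not the tool that generated this page: the function name count_ending_ngrams, the whitespace tokenization, the stripping of final punctuation, and the decision to skip sentences shorter than N words are all assumptions made for illustration.

```python
from itertools import islice


def count_ending_ngrams(sentences, k, max_n=5):
    """Count distinct word N-grams at the end of the first k sentences.

    Returns a list whose element n-1 is the number of distinct
    ending N-grams for N = n (N = 1 .. max_n).
    Hypothetical helper: tokenization and edge cases are assumptions.
    """
    seen = [set() for _ in range(max_n)]
    for sentence in islice(sentences, k):
        # Assumption: drop sentence-final punctuation, split on whitespace.
        tokens = sentence.rstrip(" .!?").split()
        for n in range(1, max_n + 1):
            # Assumption: sentences with fewer than n words contribute
            # no ending n-gram.
            if len(tokens) >= n:
                seen[n - 1].add(tuple(tokens[-n:]))
    return [len(s) for s in seen]


if __name__ == "__main__":
    sample = [
        "the cat sat down.",
        "the dog sat down.",
        "a bird flew away.",
        "sat down.",
    ]
    for k in (2, 4):
        print(k, count_ending_ngrams(sample, k, max_n=3))
```

Calling the function once per value of K gives one table row per call. Because the sets only grow as more sentences are read, each column can never decrease from one row to the next, which matches the table above.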


[Figure: Zipf's diagram for sentence endings (Gnuplot diagram)]

17438 msec needed at 2024-10-01 14:11